Compression and the origins of Zipf's law for word frequencies
نویسنده
چکیده
Here we sketch a new derivation of Zipf’s law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot’s random typing model but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures (2) it does not require fine tuning of parameters and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf’s law in human languages could originate from pressure for easy and fast communication.
منابع مشابه
Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with dif...
متن کاملThe origins of Zipf's meaning-frequency law
In his pioneering research, G. K. Zipf observed that more frequent words tend to have more meanings, and showed that the number of meanings of a word grows as the square root of its frequency. He derived this relationship from two assumptions: that words follow Zipf’s law for word frequencies (a power law dependency between frequency and rank) and Zipf’s law of meaning distribution (a power law...
متن کاملRandom texts exhibit Zipf's-law-like word frequency distribution
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as the English. The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation from the word's length to it...
متن کاملUnderstanding Zipf's law of word frequencies through sample-space collapse in sentence formation
The formation of sentences is a highly structured and history-dependent process. The probability of using a specific word in a sentence strongly depends on the 'history' of word usage earlier in that sentence. We study a simple history-dependent model of text generation assuming that the sample-space of word usage reduces along sentence formation, on average. We first show that the model explai...
متن کاملCan simple models explain Zipf's law for all exponents?
H. Simon proposed a simple stochastic process for explaining Zipf’s law for word frequencies. Here we introduce two similar generalizations of Simon’s model that cover the same range of exponents as the standard Simon model. The mathematical approach followed minimizes the amount of mathematical background needed for deriving the exponent, compared to previous approaches to the standard Simon’s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Complexity
دوره 21 شماره
صفحات -
تاریخ انتشار 2016